Atomic-detailed milestones along the folding trajectory of protein 

G 



C. Camilloni\ G. Tiana^'^ and R. A. Broglia^'^'^ 

^Department of Physics, University of Milano, 
via Celoria 16, 20133 Milan, Italy. 
"^INFN, Milan Section, Milan, Italy 
^The Niels Bohr Institute, University of Copenhagen, 
Blegdamsvej 17, DK-2100 Copenhagen, Denmark 

Abstract 

The high computational cost of carrying out molecular dynamics simulations of even small-size 
proteins is a major obstacle in the study, at atomic detail and in explicit solvent, of the physical 
mechanism which is at the basis of the folding of proteins. Making use of a biasing algorithm, based 
on the principle of the ratchet-and-pawl, we have been able to calculate eight folding trajectories 
(to an RMSD between 1.2A and 2.5A) of the Bl domain of protein G in explicit solvent without the 
need of high-performance computing. The simulations show that in the denatured state there is a 
complex network of cause-effect relationships among contacts, which results in a rather hierarchical 
folding mechanism. The network displays few local and nonlocal native contacts which are cause 
of most of the others, in agreement with the NOE signals obtained in mildly-denatured conditions. 
Also nonnative contacts play an active role in the folding kinetics. The set of conformations 
corresponding to the transition state display (^-values with a correlation coefficient of 0.69 with 
the experimental ones. They are structurally quite homogeneous and topologically nativelike, 
although some of the side chains and most of the hydrogen bonds are not in place. 



Molecular-dynamics simulations in explicit solvent can be a very useful complement to 
experimental studies of protein folding, in keeping with the fact that they provide insight into 
the time evolution of the process with atomic detail, under fully controlled conditions pQ. On 
the other hand, they are computationally very demanding, even in the case of small proteins. 
Among the most massive folding simulations ever realized is a 10/is molecular dynamics 
(MD) folding trajectory of the 38-residue WW domain, lasting for about 3 months on 329 
cores and reaching conformations which are ~ 50% similar to the native conformations in 
terms of number of contacts [2j. To be statistically sound, Pande and coworkers carried out 
410 simulations of the folding of the 35-residue Villin Headpiece, the average duration being 
863 ns. The calculation lasted for 54 machine years on a distributed computer, and eighteen 
of these trajectories reached the native conformation |3]. 

The intrinsic and unavoidable computational problem in carrying out folding simulations 
with realistic protein models is the wide range of time scales involved: the time step of the 
simulation must be tuned to femtoseconds, corresponding to the time scale of atomic vibra- 
tions, while the overall folding process spans over interval of time ranging from milliseconds 
to seconds. In an attempt to overcome this difficulty, a number of investigations focused on 
the study of unfolding simulations at high temperature [U [5] . 

A decade ago Marchi and Ballone developed an adiabatic bias molecular dynamics 
(ABMD) method |6j, to generate MD trajectories between pairs of points in the confor- 
mational space of complex systems. It was applied for the first time to protein unfolding by 
Pad and Karplus [7] . The method is based on the introduction of a biasing potential which 
is zero when the system is moving towards the desired arrival point and which damps the 
fiuctuations when the system attempts at moving in the opposite direction. As in the case 
of the ratchet and pawl system, propelled by thermal motion of the solvent molecules, the 
biasing potential does not exert work on the system. Consequently, the resulting trajecto- 
ries are physically correct. On the other hand, the algorithm cannot provide the statistical 
weight of the visited states nor the time scales associated with the trajectory. 

In the present work we report on results of the application of the ABMD algorithm to 
the study, with the help of the Amber force field p] in explicit solvent and without recurring 
to high-performance computing, of the folding of the 56-residues Bl domain of protein 
G starting from 16 thermally-unfolded conformations. From the eight trajectories which 
reached a RMSD lower than 2.5A we have extracted the conditional probabilities of contact 
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formation between the amino acids of the protein. From them it is possible to learn whether 
there are obligatory steps along the folding pathway of the protein. 



I. RESULTS 



Sixteen ABMD simulations were carried out starting from uncorrelated high-temperature 
protein conformations in water, drived by the distance dcM of the contact map of a given 
protein conformation from the native contact map (cf. Eqs. 2]|3). Eight of these simulations 



fold within an RMSD of 2.5A from the crystallographic conformation within the simulated 
time of 50 ns (see Fig. [T]). Two of the folding trajectories reach conformations within 
an RMSD of 1.2A, and other two within 1.4A. The other four folding trajectories display 
imperfections due to non-native alignment of some side-chains, a result which is fostered by 
the biasing algorithm and which is anyway compatible with the predicted glassy dynamics 
of side-chains in the native state [H] . These imperfections cause the RMSD to reach values 
up to 2.5A, even if the overall topology was correct. 

Concerning the eight non-folding trajectories, three of them display the two hairpins 
docked on the wrong side with respect to the helix; two trajectories display the hairpins 
undocked but with the orientation of the side chains which is symmetrical with respect to the 
native conformation. These misfolded conformations are reached because the definition of 
the reaction coordinate dcM employed to drive the simulation involves mainly the Ca atoms 
(which are 56) and only 12 atoms belonging to the hydrophobic side chains. Consequently, it 
is not always effective in discouraging the formation of conformations with a wrong symmetry 
of the side chains. The structure of the protein in the remaining three trajectories does not 
display misfolded features and seems only to need longer simulation times to reach the native 
state. 



A. The kinetics of contact formation is rather hierarchic 

The only information which ABMD simulations can provide concerns the sequence of 
events of the folding process (which follows what). The order of formation of the 110 native 
contacts of the protein are calculated for each trajectory, defining the quantity t{i, k) as the 
(nominal) time at which the ith contact is stably formed in the kth simulation. From it one 
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can define the probability 

1 ^ 

Mi, = -Y,Ht{hk)-t{j,k)), (1) 

k=l 

that the ith contact is formed before j, where 6 is the Heaviside's step function. This 
matrix satisfies Mij + Mji = 1 and each element Mij assumes the value 1 if the formation 
of the ith contact precedes the formation of the jth, if it follows it, and 1/2 if the two are 
uncorrelated. A related quantity to Mij is the probability Aj = ^ij/^^^ ^^at the jth 
contact is formed after any other contact. 

The plot of the Aj, ordered from the smallest to the largest values (cf. Fig. SI in the 
Supplementary Materials), can be used to study to which extent the formation of contacts 
during folding is hierarchic. If, during protein folding, native contacts are formed along a 
deterministic hierarchy of events, as in a chain of chemical reactions, the ordered Aj values 
will lay on the diagonal of the plot. On the contrary, if folding is fully cooperative, in the 
sense that all native contacts are formed simultaneously at the transition state, the ordered 
Aj values will lay on a horizontal line. One can thus define a parameter hi to measure the 
degree of "hierarchicity" of the folding process, proportional to the angular coefficient of 
the ordered Aj values, ascribing to it the value 1 in the case of a deterministic hierarchy. 
One should notice that a hierarchical folding kinetics is not incompatible with a sharp 
thermodynamic transition from denatured to native state at equilibrium, something which 
can be produced, for example, by a large free-energy barrier at one step of the hierarchy. 

The value of hi associated with the simulation is 0.65, indicating a fairly hierarchical 
process. One should notice that a large value of hi is not in disagreement with the sharp- 
ness of the thermodynamic transition between the denatured and the native state, as this 
transition usually involves a small subset of native contacts in keeping with the fact that 
a number of these contacts are already formed in the denatured state |12J. As a control, 
a random matrix satisfying the M^j + Mji = 1 requirement displays hi = 0.08. A further 
control case is that of homopolymeric chains whose rate of contact formation depends only 
on the distance along the chain of the two residues involved (see Supplementary Materials). 
This model provides a. hi = 0.27, reflecting the hierarchy arising from the straightforward 
fact that residues which are close along the chain build out contacts faster than those which 
are far apart along it. 

The curve associated with the folding simulations display six contacts with particularly- 
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low value of Aj. These contacts are between pairs of residue 22-25, 29-32, 34-38 and 35-38 
(within the helix), 46-49 and 43-54 (within the second hairpin) and 7-54 (between the N- 
and the C-terminal strands of the protein). Because the presence of such an early non-local 
contact is entropically quite unfavorable, its presence underscores its essential role in the 
whole folding process, strongly restricting the possible conformations of the chains. It is 
not stabilized by hydrogen bonds or salt bridges, but takes place between two hydrophobic 
residues (L7 and V54), L7 being close to other two hydrophobic residues (L5 and 16) and 
belonging to the eventual hydrophobic core of the protein G. 

The contacts displaying the largest probability of being formed after all the others have 
done so, belong mostly to the interface between the two hairpins and between each of 
them and the helix. Interestingly, three of these late-forming contacts are between residues 
stabilizing the first hairpin, namely 9-13, 7-16 and 4-15. 

B. The folding hierarchy involves three levels of contact formation 

To inspect the hierarchy of contact formation, we report in Fig. [3]the set of native contacts 
with respect to the probability Aj that the contact follows the other contacts (and should not 
be confused with a temporal the present simulation cannot give information in this 

respect). Thirteen contacts, boxed with a red square in the figure, are not the consequence 
of the formation of any other contact, but are themselves the cause of the formation of other 
contacts (i.e., are contacts i such that Mjj = 1 for all j's, and Mji = for some i). The 
residues building such contacts are highlighted in red in Figs. |2](A) and (B). They are mostly 
local contacts within the second hairpin and within the helix. Exception is made for contact 
39-54 between the helix and the second hairpin and the non-local contacts 5-30 and 7-54 
between the first hairpin and the helix and between the two hairpins, respectively. 

The contacts boxed in blue in Fig. [3] are those which always follow other contacts and 
never cause them. They involve all secondary and tertiary structures of the protein, indi- 
cating that no part of the protein has completely reached its native conformation before 
the overall folding. Essentially each of the red contacts is linked to some blue contact by a 
cause-effect relationship (i.e., Mij = 1), indicating that there are two well-defined layers in 
the hierarchy of contact formation. 

The five contacts squared in black constitute an intermediate layer which always follow 
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the formation of contacts 22-25 and 29-32 within the hehx, and are themselves cause of the 
formation of many contacts distributed throughout the protein (the gray arrows join these 
contacts j with the contacts i such that Mij = 1 or Mji = 1). 

The contacts not marked by squares are those displaying fractional probabilities to be the 
cause or the consequence of the formation of some other contact. These contacts are mostly 
concentrated within the first hairpin, and to a lesser extent between the two hairpins. 

Summing up, the simulation indicates that the folding mechanism of protein G involves 
first the spontaneous formation of native contacts in the second hairpin, in the helix and few 
non-local contacts. The stabilization of the first hairpin and the formation of most tertiary 
contacts come only as a consequence of these events. 

C. A small number of non native contacts are formed with probablity one 

Operatively, a non-native contact is assumed established when two amino acids lying 
more than three residues apart along the chain and further away than 5A in the native 
conformation found themselves come, in the folding process, closer than 4A. There are 9 
non-native contacts which are formed for some time in all the folding trajectories; these 
contacts display well-defined behaviours. The contacts E19-A26 and V21-A26 stabilize 
a non-native turn (hydrophobic staple motif HH [TJ]) in the region between the first 
/3-hairpin and the a-helix; this turn is disrupted when the first N-terminal turn of the 
helix is formed. Residues K31 and D40 form a non-native salt-bridge which stabilizes the 
C-terminal segment of the helix. When the helix forms its tertiary interactions with the 
hairpin, the salt bridge is broken and the side chain of K31 gets reoriented to interact with 
the hairpin. 

Contacts Y33-D40 and N35-G41 are always formed in the initial collapse of the chain. 
Contacts V39-T55 and D40-T55 have a bizarre behaviour. They are formed after the C- 
terminal part of the helix, when no other native contact between the helix and the second 
hairpin are formed. Their disruption is followed by the formation of the native contacts in 
the immediate vicinity (e.g. 39-56, 40-56), which substitute such non-native contacts in 
the docking between the helix and the second hairpin. Consequently, they seem to act as 
baits to entice together the two secondary structures of the protein to come togheter. 
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D. The transition state ensemble is structurally homogeneous and displays a very 
native like topology 

The ABMD simulations are not able to directly identify where the transition state (TS) 
lies. On the other hand, since protein folding is associated with the crossing of a free-energy 
barrier, one expects the TS to be associated with a marked jump in the reaction coordinate 
of the system. While the correct reaction coordinate is unknown, dcu has proven effective 
in ratcheting the protein to its native conformation, thus showing to be correlated with it. 
Consequently, the assumption is made that the transition state is located close to the last 
jump in dcM before the native state (cf. Fig. S2 in the Supplementary Materials). 

To test the validity of this assumption we have carried out a commitment analysis [16] on 
the first two folding trajectories. Starting from (twelve, 7 from the first trajectory and 5 from 
the second, respectively) different conformations chosen in the neighbourhood of the putative 
transition state, 20 MD simulations have been carried out for 10 ns each (see Materials and 
Methods). These calculations corresponds overall to 2.4/is of MD. The fraction Pfoid of 
folding simulations as a function of the RMSD of the a-helix and of the second /5-hairpin is 
displayed in Fig. |4] and shows a monotonic behaviour, as expected for the transition state. 
The conformations displaying pjoid = 0.5 correspond, by definition, to the transition state. 
The most representative among the selected as initial TS conformations for each of the first 
two trajectories (corresponding to times 20.820 ns and 37.360 ns, respectively) are displayed 
in Fig. |2[C) and (D). 

The TS conformations found in the two independent runs are quite similar to each other, 
displaying 54% of common formed native contacts. This result suggests that the transition- 
state ensemble is structurally rather homogeneous. Moreover, their topologies are very 
native-like, the RMSD calculated on the Ca atoms being 3.6A and 3.0A, respectively. Look- 
ing at the secondary and tertiary structures in more detail reveals less ordered features; in 
fact, the fraction of native contacts in the two TS conformations, defined as in the preceding 
Sections, are 36% and 62%, respectively. The contacts within secondary structures are well 
formed (81% of the first hairpin, 94% of the helix and 69% of the second hairpin), while 
those stabilizing tertiary structures are less so (51% of contacts a — (31, 33% of contacts 
a — f32 and 57% of contacts (31 — (32). The missing native contacts are mainly the hydrogen 
bonds associated with the hairpins (both intra- and inter-hairpin) and the contacts be- 
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tween the side chains of the hehx and the two hairpins. In other words, the overall topology 
is native-like, while the detailed geometry of secondary structures and the packing of the 
buried side chains is not yet optimized. 

The contacts which are not formed in both the transition state conformations are dis- 
played in Fig. |3] within dotted ellipses. Most of them belong to the set of contacts which 
always follow other contacts and never cause them (marked in blue in the figure), except for 
contact 28-32. 

E. Calculation of values from the folding trajectories 

The knowledge of the structure of the protein in the transition state allows to calculate the 
associated yj-values and thus be able to make a comparison with the experimental findings 
|17j . The yj-values calculated for the two transition-state conformations displayed in Figs. 
[2p and D have correlations with the experimental data of ref. [18] of only 0.25. 

Indeed, experimental (yj-values consist of an average over a very large number of molecules. 
Although the transition state obtained from ABMD simulations is topologically quite ho- 
mogeneous, yj-values are quite sensitive to the detailed atomic environment of the mutated 
side chain, resulting in a nontrivial distribution. Using as transition state ensemble the two 
transition-state conformations and the six putative conformations obtained in the previous 
Section, an average set of (/^-values was calculated and displayed in the lower panel of Fig. 
|4| in comparison with the experimental values. The correlation coefficient is now 0.69. Of 
notice that these results shed light on the power of the yj-value analysis of folding, not only 
as a valuable characterization of the transition state, but also as the specific probe, at the 
level of single atoms, of the conformations associated with the top of the folding free energy 
barrier. 

II. DISCUSSION 

With the help of the ABMD algorithm it is possible to simulate, in few days on a PC, 
the folding of a small-size protein (as e.g. protein G) with atomic detail and in explicit 
solvent making use of realistic force-fields. One can generate multiple folding trajectories, 
and consequently obtain a statistically representative information concerning the kinetics of 
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the folding process. The price to be paid for reaching this goal, is that only the simulated 
succession of events is physically meaningful, while the statistical weights of the visited 
states and the time scales obtained from the simulations are not. 

The succession of contact formation has been analyzed based on the assumption that 
the probability Mij that the formation of the jth contact follows the formation of the zth 
contact in the folding simulations reflects a cause-effect relationship between the two events. 
The resulting picture corresponds to a rather hierarchical set of contact formations. Few 
native contacts are always formed spontaneously, without the need of any other event to 
happen before. The formation of these contacts is, according to the results of the calcu- 
lations summarized in Fig. |3| a necessary condition for the formation of the contacts at 
the transition state. Of notice that in this state are also formed essentially all contacts 
which are neither cause nor effect of other contacts (those not colored in Fig. |3]), and few 
of those which are effect of the formation of other contacts (marked in blue in the figure). 
Contacts which are only cause of other contacts are mainly local, located within the helix 
and the second hairpin, while only three of them are non-local (i.e., 5-30 between the first 
hairpin and the helix, 39-54 between the helix and the second-hairpin and 7-54 between the 
two hairpins). It is not unexpected that the residues building out these early-contacts are 
concentrated in the helix and in the second hairpin (cf. Figs. [2]A. and B), which are known 
by (y9-value analysis to be structured in the transition state |T8] . Interestingly the non-local 
contact between residues 7 and 54, linking the two terminals of the protein, agrees with the 
interpretation of the effects of mutations in the first strand of protein G, according to which 
hydrophobic residues in the first strand do make some interactions with the relatively ordered 
second beta-hairpin [T8] . 

Contextualizing the above results within the framework of a two-state scenario of the 
folding of protein G, one can argue that the formation of the non-local contacts (5-30, 
39-45 and 7-54) take place in the early stages of refolding (see also ref. [U E]). Within 
this context one can mention that, NMR spectra of the pH-denatured state of protein G 
highlight elements of native structure associated with the second hairpin, with the helix 
and with the turn of the first hairpin [19j. This interpretation is consistent with results of 
simplified models showing the formation of local elementary structures (LES) [9l [TOl |20], or 
foldons [21], in the denatured state under native conditions, and their docking into a folding 
nucleus [22] • Within this picture, the docking of the LES -which can be viewed as hidden 
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incipient secondary structures but most likely lacking of a number of hydrogen boding as 
well as of side chains contacts as compared to the native situation- seems also helped by 
the early formation of few non-local native contacts and few non-native contacts. The 
transition state is compatible with the docking of the LES, corresponding to a ensemble of 
conformations where the protein display its native topology. The end of the folding process 
involves the detailed packing of the buried side-chains [HI [23] and the stabilization through 
hydrogen bonds of secondary structure elements [23] . 

The non-native interactions seem to play two roles. First, they stabilize the helix while 
the system attends the formation of the native tertiary interactions. Moreover, they help the 
folding kinetics, attracting residues which are distant along the chain for then leaving the 
place to native nearby interactions. This fact suggests that evolution could have selected, 
at the price of protein stability, sequences not only optimizing native interactions, but also 
specific non-native ones able to enhance folding kinetics. 

The Bl domain of Protein G folds in a time which is as fast as 5 ms [21], making the 
experimental characterization of the events which take place along the folding pathway, quite 
difficult. Aside from validating the model, the possibility of characterizing structurally the 
transition-state ensemble from ab initio calculations offers an unprecedented opportunity 
to complement experimental data. Therefore, we compare our results with, on one hand, 
the characterization of the denatured and of the TS state, and on the other hand, with 
computational results obtained making use of simplified models. Also with those resulting 
from studies of selected fragments of protein G. 

The NMR analysis of protein G under mildly denaturant conditions carried out by Sari 
and coworkers [19] indicate that short-range contacts between Ha and in the turn of the 
first hairpin (contact 8-12), in the helix (22-25, 22-26, 29-32, 34-38) and the long-range 
contact 41-55 are formed in the denatured state. These contacts match remarkably well 
with the "early contacts" marked in red in Fig. [3j Moreover, the J-couplings associated 
with residues 44, 49, 51 and 52 indicate that the second hairpin populates the beta region 
of the Ramachandran plot. Consistently with our results, this suggests that parts of the 
helix are formed, that the second hairpin displays a native-like topology, constrained by the 
contacts 39-54 and 41-55. Also, the result of the simulations is not inconsistent with the 
early formation of the turn in the first hairpin. 

The topology of the transition state is more similar to the native state than what ex- 
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periments usually suggest. This was already noted in ref. [23l [251 126], but on the basis of 
data-driven calculations. The difference between the transition and the native state seems 
to be not in the amount of native contacts, meant mainly as Van der Waals interactions, 
but in their degree of optimization and in the formation of orientation-dependent H-bonds. 
For example, the first hairpin is quite native-like in the transition state, but it does not 
qualify as a beta-hairpin in terms of detailed dihedral angles and H-bonds pattern. Simi- 
larly, it is likely that the native structures present in the denatured state are not textbook 
secondary structures, and consequently escape characterization with traditional tools. The 
subtle structural difference between the transition state and the native state is likely to be 
the reason why simplified protein models with reduced degrees of freedom overstimate the 
(y9-values [20]. Also the reason why y9-value analysis provide a very refined microscope at 
the all-atom level of the transition state. 

III. MATERIALS AND METHODS 

A. Model system 

The structure of the Bl-domain domain of the IgG-binding domain of streptococcal 
protein G we used has pdb code Ipgb ^7\. All the simulation are performed with GRO- 
MACS [28j. The interactions are described by the Amber 2003 all-atom force-field ported 
to Gromacs [HI [29]. The system is enclosed in a dodecahedric box of 261 nm^ with peri- 
odic boundary conditions and solvated with 8325 SPG water molecules. The system charge 
was neutralized adding 4 Na"*" ions. Van der Waals interactions are cut-off at 1.4 nm and 
the long-range electrostatic interactions were calculated by the particle mesh Ewald algo- 
rithm [30] with a mesh spaced of 0.125 nm. The system evolve in the canonical ensemble, 
coupled with a Nose-Hoover thermal bath |311 [32] . The native state is first thermalized at 
300 K for 1 ns. A 2 ns dynamics at 300 K at constant volume used to generate a reference 
native state ensemble. 

B. Initial conformations 

To generate a set of unfolded conformation we run a 26 ns long simulation at 600 K, 
we took 16 conformations from the 10th to 26th ns and we thermalized them at 300K 
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for 2 ns each. The RMSD calculated on the Ca atoms between the 16 structures after 
the thermalization is between 0.8 to 1.4 nm which guarantee that the dynamics will be 
uncorrelated. 



C. Adiabatically biased trajectories 

The trajecories are generated by a biased molecular dynamics algorithm proposed by 
Marchi and Ballone [6] and applied to proteins by Paci and Karplus [7]. The driving coor- 
dinate used in the present study is the distance dcM of the contact map of a given protein 
conformation from the native contact map, introduced by Bonomi at al. in ref. |33j. This is 
defined as 

j>i+2 J 

were C,,- is the i,j element of a NxN matrix defined as 



C^Ar^,) = { '-^^^ (3) 

Tij is the distance between atom i and j and C is the defined on the native state. The 
parameters used in these simulations are p = 6, g = 10, tq = 0.75 nm and Vcut = 1-23 nm. N 
include all the a carbons and either the f3 or the 7 carbons of the hydrophobic side-chains. 
The biasing potential is implemented as 

V{p{t)) = { (4) 
0, p{t)<pUt), 



where 



and 



pit) = idcMit)-Of (5) 



Pm{t) = min p(r). (6) 

o<r<r 

The first tests to chose a proper collective variable were done with a constant a of 40 
KJ/mol, while for the production runs the constant is set to 3 K Joule/mole. Each of the 
sixteen unfolded structure is evolved for 50 ns. 
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D. Contact analysis 



A contact between two amino acids is defined if 1) there is a H-bonds between the two 
amino acids, that is a polar H and an O are closer than 2. 5 A and their respective bonds are 
aligned with a maximum deviation of 30 deg, or 2) the minimum distance between any atom 
in their side-chain is less than 4A. Native contacts are defined if the above property holds 
for the average distances calculated on the native state ensemble. Having also calculated the 
standard fluctuations of the atom distances in the native state, a native contact is defined 
as stably formed once it is formed and, since then, its fluctuations do not exceed the double 
of those found in the native state. A non-native contact between two amino acids is defined 
if 1) the minimum distance between any atom is less than 4A and, 2) the mean minimum 
distance between any atom is more than SA on the native state ensemble. 
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FIG. 1: The RMSD calculated on the Ca atoms along the 8 folded trajectories 




FIG. 2: (A) and (B) The native conformation displayed from two different points of view, with the 
residues building contacts which are not the consequence of any other contact displayed as cartoons, 
and their sidechain drawn with balls and sticks. (C) and (D) The transition states obtained from 
the first two trajectories, respectively. 
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FIG. 3: From the matrix Mjj obtained from the folding simulations one can obtain the hierarchy of 
contact formation. The plot shows the native contacts in a plot where the horizontal axis indicate 
the probability Aj that the jth contact is formed last, while the vertical axis indicates which regions 
of protein G it involves. Contacts marked with a red square are those which are always cause and 
never consequence of the formation of other contacts; on the contrary, contacts marked with a 
blue square are those which are always consequence and never cause. Contacts in a black square 
are both cause and effect of other contacts. The arrows indicate a cause-effect relationship. Since 
the simulation shows a cause-effect relationship between essentially any red contacts to any blue 
contact (as suggested by the curved arrows), the arrows are explicitly indicated only for the gray 
contacts. The colored areas are meant to guide the eye in connection with contacts belonging to 
contiguous regions of the protein. Dotted ellipses indicate those contacts which are not yet formed 
in any of the two transition state conformations displayed in Figs. W[C) and (D). 
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FIG. 4: (upper panel) The folding probability of MD simulations starting from conformations with 
different values of the RMSD between the helix and the second hairpin obtained by two different 
ABMD trajectories. The transition state corresponds to the conformations with folding probability 
equal to 0.5 (dashed line). The RMSD between the helix and the second hairpin has been chosen 
because results to be the region of the protein that is the least structured in the transition state 
(33% of native contacts formed, see text), (left panel, below) The ip-values obtained by simulations 
(dashed curve) and by experiments (solid curve). 
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